Abstract. Sorting a Permutation by Transpositions (SPbT) is an important
problem in Bioinformatics. In this paper, we improve the running time of the
best known approximation algorithm for SPbT. We use the permutation tree
data structure of Feng and Zhu to improve the running time of the
1.375-approximation algorithm for SPbT of Elias and Hartman (the EH
algorithm) to $O(n \log n)$. The previous running time of the EH algorithm
was $O(n^2)$.

1 Introduction 

Transposition is an important genome rearrangement operation, and Sorting
a Permutation by Transpositions (SPbT) is an important problem in
Bioinformatics. In a transposition, a segment is cut out of the permutation
and pasted in a different location. SPbT was first studied by Bafna and
Pevzner [1], who gave the first 1.5-approximation algorithm, which has
quadratic running time. Eriksson et al. [4] gave an algorithm that sorts any
given permutation of $n$ elements by at most $\frac{2}{3}n$ transpositions.
Later, Hartman and Shamir used the concept of the simplified breakpoint
graph to design another 1.5-approximation algorithm with $O(n^2)$ running
time [8]. They further used splay trees to implement this simplified
algorithm, thereby reducing the time complexity to
$O(n^{3/2}\sqrt{\log n})$ [8]. Finally, Elias and Hartman presented a
1.375-approximation algorithm in [3], which is the best known approximation
algorithm for SPbT in the literature so far. The running time of that
algorithm, however, is $O(n^2)$. Very recently, in [5], Feng and Zhu
improved the running time of the 1.5-approximation algorithm of [8] to
$O(n \log n)$ by introducing a new data structure named the permutation
tree. In this paper, with the help of the permutation tree, we improve the
running time of the 1.375-approximation algorithm for SPbT of [3] to
$O(n \log n)$.

2 Preliminaries 

A transposition $\tau = trans(i, j, k)$ on $\pi = (\pi_0 \dots \pi_{n-1})$
is an exchange of two disjoint contiguous segments
$X = \pi_i, \dots, \pi_{j-1}$ and $Y = \pi_j, \dots, \pi_{k-1}$. Given a
permutation $\pi$, SPbT asks for a sequence of transpositions that
transforms $\pi$ into the identity permutation such that the number of
transpositions $t$ is minimized. The transposition distance of a permutation
$\pi$, denoted by $d(\pi)$, is the smallest possible value of $t$. The
breakpoint graph $G(\pi)$ [1] is an edge-colored graph on $2n$ vertices
$\{l_0, r_0, l_1, r_1, \dots, l_{n-1}, r_{n-1}\}$. For every
$0 \le i \le n-1$, $r_i$ and $l_{i+1}$ are connected by a grey edge, and for
every $\pi_i$, $l_{\pi_i}$ and $r_{\pi_{i-1}}$ are connected by a black
edge, denoted by $b_i$. The breakpoint graph uniquely decomposes into
$c(\pi)$ cycles. A $k$-cycle has $k$ black edges; if $k$ is odd (resp.
even), the cycle is odd (resp. even). Further, if $k \le 3$, the cycle is
short; otherwise, it is long. The number of odd cycles is denoted by
$c_{odd}(\pi)$, and we define
$\Delta c_{odd}(\pi, \tau) = c_{odd}(\tau \cdot \pi) - c_{odd}(\pi)$,
where $\tau \cdot \pi$ denotes the result after $\tau$ is applied. A
transposition $\tau$ is a $k$-move if $\Delta c_{odd}(\pi, \tau) = k$. A
cycle is called oriented if there is a 2-move that is applied on three of
its black edges; otherwise, it is unoriented. If $G(\pi)$ contains only
short cycles, then both $\pi$ and $G(\pi)$ are called simple. A permutation
$\pi$ is a 2-permutation (resp. 3-permutation) if $G(\pi)$ contains only
2-cycles (resp. 3-cycles). Permutations can be made simple by inserting new
elements into them, thereby splitting the long cycles [6].
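
The definitions above can be illustrated with a small Python sketch. It is not taken from the paper: it assumes that $\pi$ is a permutation of $0, \dots, n-1$ read circularly (all indices and values modulo $n$), and the helper names `trans`, `cycle_lengths` and `c_odd` are hypothetical. It follows black edge $b_i$ from $l_{\pi_i}$ to $r_{\pi_{i-1}}$ and then the grey edge from $r_v$ to $l_{v+1}$, counting the black edges of each cycle.

```python
def trans(pi, i, j, k):
    """Apply trans(i, j, k): exchange segments pi[i:j] and pi[j:k] (0-indexed)."""
    assert 0 <= i < j < k <= len(pi)
    return pi[:i] + pi[j:k] + pi[i:j] + pi[k:]


def cycle_lengths(pi):
    """Return the sorted black-edge counts of the cycles of G(pi).

    Assumption (not from the paper): pi is a permutation of 0..n-1 that is
    read circularly, so indices and values wrap around modulo n.
    """
    n = len(pi)
    pos = {v: i for i, v in enumerate(pi)}  # position of each value
    seen = [False] * n                      # seen[i]: black edge b_i visited
    lengths = []
    for start in range(n):
        if seen[start]:
            continue
        i, k = start, 0
        while not seen[i]:
            seen[i] = True
            k += 1
            v = pi[(i - 1) % n]   # black edge b_i ends at r_v, v = pi[i-1]
            i = pos[(v + 1) % n]  # grey edge r_v -- l_{v+1}, which lies on b_{pos[v+1]}
        lengths.append(k)
    return sorted(lengths)


def c_odd(pi):
    """Number of odd cycles of G(pi)."""
    return sum(1 for k in cycle_lengths(pi) if k % 2 == 1)


pi = [0, 2, 1]
tau_pi = trans(pi, 1, 2, 3)          # exchange the segments (2) and (1)
delta = c_odd(tau_pi) - c_odd(pi)    # Delta c_odd(pi, tau)
```

Under these assumptions, `delta` is the quantity $\Delta c_{odd}(\pi, \tau)$ for the chosen transposition, so the sketch can be used to test whether a given transposition is a $k$-move on small inputs.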

Two pairs of black edges $(a, b)$ and $(c, d)$ are said to intersect if
their edges occur in alternating order in the breakpoint graph, i.e., in
the order $a, c, b, d$. Cycles $C$ and $D$ intersect if there is a pair of
black edges in $C$ that intersects with a pair of black edges in $D$. A
configuration of cycles is a subgraph of the breakpoint graph that is
induced by one or more cycles. A configuration $A$ is connected if for any
two cycles $c_1$ and $c_k$ of $A$ there are cycles
$c_2, \dots, c_{k-1} \in A$ such that, for each $i \in [1, k-1]$, $c_i$
intersects with $c_{i+1}$. A component is a maximal connected configuration
in a breakpoint graph. The size of a configuration or a component is the
number of cycles it contains. A configuration (similarly, a component) is
said to be unoriented if all of its cycles are unoriented. A configuration
(similarly, a component) is small if its size is at most 8; otherwise it is
big. Small components that do not have an $\frac{11}{8}$-sequence are called
bad small components [3].
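
As an illustration of intersection and components, the following Python sketch groups cycles into components with a union-find structure. It is only a quadratic-time illustration, not the procedure used in the paper. It assumes each cycle is given as a tuple of the indices of its black edges, and the names `pairs_intersect`, `cycles_intersect` and `components` are hypothetical.

```python
from itertools import combinations


def pairs_intersect(p, q):
    """True if the two pairs of black-edge indices alternate (a, c, b, d)."""
    (a, b), (c, d) = sorted(p), sorted(q)
    return a < c < b < d or c < a < d < b


def cycles_intersect(C, D):
    """True if some pair of black edges of C intersects some pair of D."""
    return any(pairs_intersect(p, q)
               for p in combinations(C, 2)
               for q in combinations(D, 2))


def components(cycles):
    """Group cycle indices into maximal connected configurations."""
    parent = list(range(len(cycles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(range(len(cycles)), 2):
        if cycles_intersect(cycles[u], cycles[v]):
            parent[find(u)] = find(v)

    groups = {}
    for u in range(len(cycles)):
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())


# Three hypothetical 3-cycles, each given by its black-edge indices.
cycles = [(0, 2, 4), (1, 3, 5), (6, 7, 8)]
comps = components(cycles)
```

Here the first two cycles interleave and form one component of size 2, while the third cycle forms a component on its own.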

In a configuration, an open gate is a pair of black edges of a 2-cycle or an
unoriented 3-cycle that does not intersect with another cycle of that
configuration. A configuration not containing open gates is referred to as a
full configuration. An $(x, y)$-sequence of transpositions on a simple
permutation (for $x \ge y$) is a sequence of $x$ transpositions such that at
least $y$ of them are 2-moves and that leaves a simple permutation at the
end.

A permutation tree [5] is, firstly, a balanced binary tree $T$ with root
$r$, where each internal node of $T$ has two children. The left and right
children of an internal node $t$ are denoted by $L(t)$ and $R(t)$,
respectively. The height of $t$ is denoted by $H(t)$; a leaf node has height
zero. Secondly, a permutation tree must correspond to a permutation. The
permutation tree corresponding to $\pi$ has $n$ leaf nodes, labeled by
$\pi_1, \pi_2, \dots, \pi_n$, respectively. Each node corresponds to an
interval of $\pi$ and has a value equal to the maximum number in the
interval. The interval corresponding to an internal node $t$ is the
concatenation of the two intervals corresponding to $L(t)$ and $R(t)$. The
height of the permutation tree of $\pi$ is bounded by $O(\log |\pi|)$. A
permutation tree can be built (Build operation) in $O(|\pi|)$ time. Suppose
$T$, $t_1$ and $t_2$ correspond to
$(\pi_1 \pi_2 \dots \pi_{m-1} \pi_m \pi_{m+1} \dots \pi_n)$,
$(\pi_1 \pi_2 \dots \pi_m)$ and $(\pi_{m+1} \pi_{m+2} \dots \pi_n)$,
respectively. Then $Join(t_1, t_2)$ returns $T$ in $O(H(t_1) - H(t_2))$
time, and $Split(T, m)$ returns $t_1$ and $t_2$ in $O(\log n)$ time.
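
The Build operation can be sketched in Python as follows. This is only one way to obtain a balanced tree with the stated properties (leaves labeled by the permutation, each node storing the maximum of its interval), not necessarily the construction of [5]. The class `Node` and the functions `build` and `leaves` are hypothetical names.

```python
class Node:
    """A node storing the maximum of its interval and its height."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        # A leaf has height zero, as in the definition above.
        self.height = 0 if left is None else 1 + max(left.height, right.height)


def build(pi, lo=0, hi=None):
    """Build a balanced max-tree over pi[lo:hi] by halving the interval."""
    if hi is None:
        hi = len(pi)
    if hi - lo < 1:
        raise ValueError("the permutation must be non-empty")
    if hi - lo == 1:
        return Node(pi[lo])
    mid = (lo + hi) // 2
    left, right = build(pi, lo, mid), build(pi, mid, hi)
    return Node(max(left.value, right.value), left, right)


def leaves(t):
    """Read the permutation back from the leaves, left to right."""
    if t.left is None:
        return [t.value]
    return leaves(t.left) + leaves(t.right)


root = build([3, 1, 4, 2, 5])
```

Each element is visited once as a leaf and there are $n - 1$ internal nodes, so this sketch does linear work, and halving the interval keeps the height logarithmic in $n$.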

3 Faster Running Time for Elias and Hartman's 
Algorithm 

The 1.375-approximation algorithm for SPbT of Elias and Hartman (referred
to as the EH algorithm henceforth) is presented in Algorithm 1.



Algorithm 1 EH Algorithm

 1: Transform permutation $\pi$ into a simple permutation $\hat{\pi}$.
 2: Check if there is a (2, 2)-sequence. If so, apply it.
 3: While $G(\pi)$ contains a 2-cycle, apply a 2-move.
 4: $\pi$ is a 3-permutation. Mark all 3-cycles in $G(\pi)$.
 5: while $G(\pi)$ contains a marked 3-cycle $C$ do
 6:   if $C$ is oriented then
 7:     apply a 2-move on it.
 8:   else
 9:     Try to sufficiently extend $C$ eight times.
10:     if a sufficient configuration has been achieved then
11:       apply an $\frac{11}{8}$-sequence.
12:     else
13:       it must be a small component. If an $\frac{11}{8}$-sequence is
          still possible, apply it.
14:       if applying a sequence is not possible then
15:         This must be a bad small component. Unmark all cycles of the
            component.
16:       end if
17:     end if
18:   end if
19: end while
20: Now, $G(\pi)$ contains only bad small components. While $G(\pi)$
    contains at least 8 cycles, apply an $\frac{11}{8}$-sequence.
21: While $G(\pi)$ contains a 3-cycle, apply a (3, 2)-sequence.
22: Mimic the sorting of $\pi$ using the sorting of $\hat{\pi}$.



To achieve our goal, we need to be able to use the permutation tree for
applying $(x, y)$-sequences and $k$-moves. Additionally, given a pair of
black edges, we need to find, with the help of a permutation tree, another
pair of black edges such that these two pairs intersect. Feng and Zhu [5]
used the following lemma to find such a pair of black edges.

Lemma 1. ([1]) Let $b_i$ and $b_j$ be two black edges in an unoriented cycle
$C$ such that $i < j$. Let $\pi_k = \max_{i < m \le j} \pi_m$ and
$\pi_l = \pi_k + 1$. Then the black edges $b_k$ and $b_{l-1}$ belong to the
same cycle and the pair $(b_k, b_{l-1})$ intersects the pair $(b_i, b_j)$. □

Feng and Zhu suggested that a permutation tree can be used for query and
transposition as follows. Assume that the permutation tree $T$ corresponding
to a simple permutation $\pi = (\pi_1 \dots \pi_n)$ has been constructed by
procedure Build. Procedure $Query(\pi, i, j)$ finds a pair of black edges
intersecting the pair $(b_i, b_j)$, and procedure
$Transposition(\pi, i, j, k)$ applies a transposition $trans(i, j, k)$ on
$\pi$. These two procedures can be implemented as follows.

1. $Query(\pi, i, j)$: Split $T$ into three permutation trees $t_1$, $t_2$
and $t_3$, corresponding to, respectively, $(\pi_1, \dots, \pi_i)$,
$(\pi_{i+1}, \dots, \pi_j)$ and $(\pi_{j+1}, \dots, \pi_n)$. Clearly, this
can be done in $O(\log n)$ time by two splitting operations on $T$. The
value of the root of $t_2$ is the largest element (say, $\pi_k$) in the
interval $[\pi_{i+1} \dots \pi_j]$. Assume that $\pi_l = \pi_k + 1$. By
Lemma 1, the pair $(b_k, b_{l-1})$ intersects the pair $(b_i, b_j)$.

2. $Transposition(\pi, i, j, k)$: Split $T$ into four permutation trees
$t_1$, $t_2$, $t_3$ and $t_4$, corresponding to, respectively,
$(\pi_1, \dots, \pi_{i-1})$, $(\pi_i, \dots, \pi_{j-1})$,
$(\pi_j, \dots, \pi_{k-1})$ and $(\pi_k, \dots, \pi_n)$. Then, join the four
trees by executing $Join(Join(Join(t_1, t_3), t_2), t_4)$. Clearly,
adjusting the permutation tree $T$ can be done by three splitting and three
joining operations in $O(\log n)$ time.

In the rest of this section, we state and prove a number of lemmas
concerning the running times of the different steps of the EH algorithm,
which together yield an $O(n \log n)$ running time for the algorithm.

Lemma 2. Step 1 of the EH algorithm can be implemented in O(n) time. 

Proof. A permutation $\pi$ is made simple by $(g, b)$-splits acting on the
breakpoint graph $G(\pi)$. A $(g, b)$-split for $G(\pi)$ splits one cycle
into two shorter ones. Equivalently, this operation inserts a new element
into $\pi$ [7]. A breakpoint graph $G(\pi)$ can be transformed into
$G(\hat{\pi})$ containing only 1-cycles, 2-cycles, and 3-cycles by a series
of $(g, b)$-splits [8]; that is, the permutation $\hat{\pi}$ corresponding
to $G(\hat{\pi})$ becomes simple. This can be done by scanning the
permutation linearly and inserting a new element when necessary. Thus Step 1
can be implemented in $O(n)$ time. □

Lemma 3. Step 2 of the EH algorithm can be implemented in $O(n \log n)$
time.

Proof. To check whether a (2, 2)-sequence exists, the following sub-steps
are executed:

(a) We check whether there are (at least) four 2-cycles. If yes, then we are done; 
otherwise we go to the next step. 

(b) If there are two intersecting 2-cycles then a (2, 2)-sequence exists and we are 
done [3]. Otherwise we go to the following step. 

(c) If there are two nonintersecting 2-cycles, we apply a transposition on
three of the four black edges of the two 2-cycles (checking all four
possibilities). Clearly, this is a 2-move [2]. Now, there is a
(2, 2)-sequence iff there is an oriented cycle in the resulting graph.
Otherwise we go to the following step.

(d) In this case the permutation is a 3-permutation. Here, if all cycles are un- 
oriented, there is no (2,2)-sequence. Otherwise, for each oriented 3-cycle, we 
need to check if, after applying a 2-move on it, there is an oriented cycle in 
the resulting graph. There is a (2,2)-sequence iff the answer is yes for some 
cycle. 

Clearly, the running time is dominated by Sub-steps (c) and (d), since these
two cases involve applying 2-moves and transpositions. Hence the result
follows. □

Lemma 4. Steps 3 and 4 of the EH algorithm can be implemented in
$O(n \log n)$ time.

Proof. The breakpoint graph of a simple permutation contains an even number
of 2-cycles [5]. A 2-move in Step 3 transforms two 2-cycles into a 1-cycle
and a 3-cycle. All the 2-cycles of $G(\pi)$ can be found in linear time and
eliminated by $O(n)$ 2-moves. Since one transposition takes $O(\log n)$
time, Step 3 can be done in $O(n \log n)$ time. Finally, for Step 4, all the
3-cycles can be marked by a linear scan of the breakpoint graph, which takes
$O(n)$ time. Hence the result follows. □

Lemma 5. The while loop at Step 5 of the EH algorithm can be implemented
in $O(n \log n)$ time.

Proof. The loop iterates at most $n$ times and each iteration takes
$O(\log n)$ time, as follows. Step 7 runs in $O(\log n)$ time because, to
apply a 2-move, we use the transposition operation on the permutation tree.
Now, consider Steps 9 to 15. There are two types of extensions that are
sufficient for extending any cycle $C$:



1. Type 1: Extensions closing open gates, and 

2. Type 2: Extensions of full configurations such that the extended configura- 
tion has at most one open gate. 

To do a sufficient extension of Type 1 (add a cycle that closes an open
gate), we need to pick an arbitrary open gate and find another cycle that
intersects with the open gate. For this, we query the permutation tree with
the pair of black edges $(b_i, b_j)$ of the open gate under consideration.
The query procedure in turn returns the intersecting pair $(b_k, b_{l-1})$
as stated above. This step takes $O(\log n)$ time.

If the configuration is full, i.e., there are no open gates, we do a
sufficient extension of Type 2. To do this, we query the permutation tree
with each pair of black edges of each cycle in the configuration, until we
find a cycle that intersects with a pair. If such a cycle is found, we
extend the configuration by this cycle to find a component of size greater
than or equal to 9. As there can be at most 24 such pairs of black edges (a
configuration of at most 8 cycles, each 3-cycle having 3 pairs of black
edges), this step takes $O(\log n)$ time as well. Finally, we apply an
$\frac{11}{8}$-sequence by using the transposition procedure of the
permutation tree, which takes $O(\log n)$ time. Hence the result follows. □

Lemma 6. Steps 20 to 22 of the EH algorithm can be implemented in
$O(n \log n)$ time.

Proof. In Step 20, the application of an $\frac{11}{8}$-sequence takes
$O(\log n)$ time and this step iterates at most $O(n)$ times. Now, applying
a (3, 2)-sequence is essentially equivalent to applying 3 transpositions
such that at least 2 of them are 2-moves. Since there can be no more than
$n$ 3-cycles, Step 21 also runs in $O(n \log n)$ time. Hence the result
follows, since Step 22 of the EH algorithm can also be implemented in
$O(n \log n)$ time [5]. □

From the above lemmas, it is easy to see that the EH algorithm, implemented
with the permutation tree, runs in $O(n \log n)$ time.
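
Written out, the total simply sums the per-step bounds of Lemmas 2 to 6; the symbol $T(n)$ for the overall running time is introduced here only for illustration.

```latex
\begin{align*}
T(n) &= \underbrace{O(n)}_{\text{Step 1 (Lemma 2)}}
      + \underbrace{O(n \log n)}_{\text{Step 2 (Lemma 3)}}
      + \underbrace{O(n \log n)}_{\text{Steps 3--4 (Lemma 4)}}
      + \underbrace{O(n \log n)}_{\text{Steps 5--19 (Lemma 5)}}
      + \underbrace{O(n \log n)}_{\text{Steps 20--22 (Lemma 6)}} \\
     &= O(n \log n).
\end{align*}
```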